Data Preparation Part II

In this part we will study different aircrafts types Using PCA

ACP

the first variable was considered as a qualitative variable because it presents the different modalities of the international fleet

library(factoextra)
## Warning: package 'factoextra' was built under R version 3.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.3
## Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
library(FactoMineR)
data <-read.table(file = file.choose(), sep=";", header=TRUE , dec=".")
dataFLotte <- read.table(file = file.choose(), sep=";", header=TRUE , dec=".")
View(dataFLotte)
result_pca <- PCA(data, quali.sup=1)

eig.val <- get_eigenvalue(result_pca)
fviz_eig(result_pca, addlabels = TRUE, ylim = c(0, 50))

The correlation circle summarizes 65.2% of the total information only in the first and second axis if we include the 3rd axis we will have 80% of the total information

Projection on the first three axis:

result_pca <- PCA(data, quali.sup=1)

result_pca <- PCA(data, quali.sup=1,axes = 2:3)

result_pca <- PCA(data, quali.sup=1,axes =c(1,3))

library("corrplot")
## corrplot 0.84 loaded
var <- get_pca_var(result_pca)
corrplot(var$cos2, is.corr=FALSE)

fviz_cos2(result_pca, choice = "var", axes = 1:2)

fviz_cos2(result_pca, choice = "var", axes = 2:3)

fviz_cos2(result_pca, choice = "var", axes = c(3,1))

The plots showed us that there are 3 clouds of individuals Unfortunately the interpretation seems a little difficult given the large number of variables to be studied. For this reason, we will eliminate variables that are not relevant and have a poor representation. We will consider the individual VIP as additional (since it presents a unique case) as well as the varibales 2,3,4 (relating to this individual VIP) and 24,25,26,27,28,29,30,31 , 32,33,34 which are binary variables giving information on the types of classes in each airplane, this is redundancy because the names of the variables give information on the existing classes. EXample: seat.width..FirstClass

result=PCA(data,ind.sup =2,quanti.sup=c(2,3,4,24,25,26,27,28,29,30,31,32,33,34),quali.sup =1)

## Warning in arrows(0, 0, coord.quanti[q, 1], coord.quanti[q, 2], length =
## 0.1, : zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, coord.quanti[q, 1], coord.quanti[q, 2], length =
## 0.1, : zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, coord.quanti[q, 1], coord.quanti[q, 2], length =
## 0.1, : zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, coord.quanti[q, 1], coord.quanti[q, 2], length =
## 0.1, : zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, coord.quanti[q, 1], coord.quanti[q, 2], length =
## 0.1, : zero-length arrow is of indeterminate angle and so skipped

fviz_pca_var(result, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Évite le chevauchement de texte,
             
)

From a business point of view, we decided to eliminate some variables such as WingSpan, tail.Height, mileRange, Engines to have accurate results, that’s why we’ll eliminate them While flying over the data, we noticed that there are two aspects to study: all that is related to the places and their classes (Business, EcoComfort ..;) and the other options.

Studying the “Seats” in the different classes

Now , we are going to focus on the variables related to seats characteristics and try to reduce dimensionality

dataSeats=data.frame(dataFLotte$Aircraft,dataFLotte$Seats..First.Class.,dataFLotte$Seats..Business.,dataFLotte$Seats..Eco.Comfort.,dataFLotte$Seats..Economy.,dataFLotte$Accommodation)

result1=PCA(dataSeats,ind.sup =2,quali.sup =1)

fviz_pca_var(result1, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Évite le chevauchement de texte,
             
)

var <- get_pca_var(result1)
library("corrplot")
corrplot(var$cos2, is.corr=FALSE)

fviz_cos2(result1, choice = "var", axes = 1:2)


The accomodation variable informs us of the number of seats in each plane and the other variables (Seats first class, seats business, seats Economy and Seats economy comfort) provide information on the number of seats per class in the same plane. the circle in this case sums up more than 95% of the total information with a good quality of representation.

we note that:

-Accomodation, seats Eco comfort and Seats Economy are the most corrolated positively with the first axis

-Seats First class is corrolated positively with the second axis

on the other hand Seats business is negatively correlated with the second axis and positively with the first axis

and if we project the individuals, we obtain the following information:

-a first cloud positively corrolated with the first axis representing aircraft that specialize mainly in the classes Eco comfort, Economy and less importantly in Business class

EXAMPLE: individual 14 having 376 places distributed as follows:

Economy :286 places Eco Comfort :42 places Business :48 places

-A second cloud of points correlated positively with the axis 2 which represents the planes which have more places in First class taking the example of the individual 28 having the highest number of first class places: 36 places

_a third point cloud correlated negatively with the two axes having a small number of places compared to the other planes Example the individual 38 having a total of 28 places.

fviz_pca_ind(result1,
             geom.ind = "point", # Montre les points seulement (mais pas le "text")
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Ellipses de concentration
             legend.title = "Groups"
             
          
)

fviz_pca_biplot(result1, axes = c(1, 2),
    label = "all", invisible = "none", labelsize = 4, pointsize = 2)

Clustering :CAH and Kmeans

View(dataSeats)
boxplot(dataSeats[,2:5],names = c("First Class","business","Eco Comfort","Economy"))

#ScaledSeats=scale(dataSeats[,2:5], center = TRUE, scale = TRUE)
#boxplot(ScaledSeats[,],names = c("First Class","business","Eco #Comfort","Economy"))

1-CAH method :

#la matrice des distances entre tt les individus   
matrix=dist(dataSeats[,], method = "manhattan", diag = FALSE, upper = FALSE, p = 2)
## Warning in dist(dataSeats[, ], method = "manhattan", diag = FALSE, upper =
## FALSE, : NAs introduits lors de la conversion automatique
#clustering avec la methode compléte
h1=hclust(matrix, method = "complete", members = NULL)
plot(h1, main = "Dendrogramme avec methode d'agg complete")

#clustering avec la methode wadr D2
h2=hclust(matrix, method = "ward.D2", members = NULL)
plot(h1, main = "Dendrogramme avec methode ward D2")

we note that in the two segmentations CAH we found that the same element 14 is still an aberrated value this individuals is characterized by the largest number of seats dedicated to the economy class We will see the output of the segmentation by eliminating this element

#la matrice des distances entre tt les individus  
 
dataSeats1=dataSeats[-c(14),]
#la matrice des distances entre tt les individus   
matrix=dist(dataSeats1[,], method = "manhattan", diag = FALSE, upper = FALSE, p = 2)
## Warning in dist(dataSeats1[, ], method = "manhattan", diag = FALSE, upper =
## FALSE, : NAs introduits lors de la conversion automatique
#clustering avec la methode compléte
h1=hclust(matrix, method = "complete", members = NULL)
plot(h1, main = "Dendrogramme avec methode d'agg complete")

#clustering avec la methode wadr D2

h2=hclust(matrix, method = "ward.D2", members = NULL)
plot(h1, main = "Dendrogramme avec methode ward D2")


we had the same result by applying the method Ward D2 and the complete method

inertie <- sort(h2$height, decreasing = TRUE)
plot(inertie[1:10], type = "s", xlab = "Nombre de classes", 
     ylab = "Inertie")
points(c(2, 3, 4,6), inertie[c(2, 3,4,6)], col = c("green3", 
                                               "red3", "blue3","yellow3"), cex = 2, lwd = 3)

We can see that we can segment our data in 2, 3 or 4 clusters

groupes.cah <- cutree(h2,k=3 )
plot(groupes.cah)

barplot(table(groupes.cah))

dataSeatsClasseCAH3=data.frame(dataSeats1,groupes.cah)
classe1=dataSeatsClasseCAH3[which(dataSeatsClasseCAH3$groupes.cah==1),]
classe2=dataSeatsClasseCAH3[which(dataSeatsClasseCAH3$groupes.cah==2),]
classe3=dataSeatsClasseCAH3[which(dataSeatsClasseCAH3$groupes.cah==3),]
dataSeatsClasseCAH3
##             dataFLotte.Aircraft dataFLotte.Seats..First.Class.
## 1                   Airbus A319                             12
## 2               Airbus A319 VIP                             28
## 3                   Airbus A320                             12
## 4              Airbus A320 32-R                             12
## 5               Airbus A330-200                              0
## 6         Airbus A330-200 (3L2)                              0
## 7         Airbus A330-200 (3L3)                              0
## 8               Airbus A330-300                              0
## 9                    Boeing 717                             12
## 10         Boeing 737-700 (73W)                             12
## 11         Boeing 737-800 (738)                             16
## 12         Boeing 737-800 (73H)                             16
## 13       Boeing 737-900ER (739)                             20
## 15         Boeing 757-200 (75A)                             24
## 16         Boeing 757-200 (75E)                              0
## 17         Boeing 757-200 (75M)                             22
## 18         Boeing 757-200 (75N)                             22
## 19         Boeing 757-200 (757)                             24
## 20         Boeing 757-200 (75V)                             22
## 21         Boeing 757-200 (75X)                              0
## 22               Boeing 757-300                             24
## 23         Boeing 767-300 (76G)                              0
## 24         Boeing 767-300 (76L)                              0
## 25         Boeing 767-300 (76P)                             30
## 26         Boeing 767-300 (76Q)                             30
## 27         Boeing 767-300 (76T)                              0
## 28         Boeing 767-300 (76U)                             36
## 29     Boeing 767-300 (76Z V.1)                              0
## 30     Boeing 767-300 (76Z V.2)                              0
## 31         Boeing 767-400 (76D)                              0
## 32             Boeing 777-200ER                              0
## 33             Boeing 777-200LR                              0
## 34 CRJ 100/200 Pinnacle/SkyWest                              0
## 35       CRJ 100/200 ExpressJet                              0
## 36                      CRJ 700                              9
## 37                      CRJ 900                             12
## 38                         E120                              0
## 39                         E170                              9
## 40                         E175                             12
## 41                      ERJ-145                              0
## 42                        MD-88                             16
## 43                        MD-90                             16
## 44                    MD-DC9-50                             16
##    dataFLotte.Seats..Business. dataFLotte.Seats..Eco.Comfort.
## 1                            0                             18
## 2                           14                              0
## 3                            0                             18
## 4                            0                             18
## 5                           32                             30
## 6                           34                             32
## 7                           34                             32
## 8                           34                             32
## 9                            0                             15
## 10                           0                             18
## 11                           0                             18
## 12                           0                             18
## 13                           0                             21
## 15                           0                             18
## 16                          16                             25
## 17                           0                             18
## 18                           0                             19
## 19                           0                             20
## 20                           0                             21
## 21                          26                             26
## 22                           0                             23
## 23                          30                             38
## 24                          36                             32
## 25                           0                             28
## 26                           0                             28
## 27                          36                             29
## 28                           0                             31
## 29                          26                             35
## 30                          26                             29
## 31                          40                             28
## 32                          45                             36
## 33                          45                             36
## 34                           0                              0
## 35                           0                              0
## 36                           0                              8
## 37                           0                             12
## 38                           0                              0
## 39                           0                             12
## 40                           0                             12
## 41                           0                              0
## 42                           0                             15
## 43                           0                             15
## 44                           0                             14
##    dataFLotte.Seats..Economy. dataFLotte.Accommodation groupes.cah
## 1                          96                      126           1
## 2                           0                       54           2
## 3                         120                      150           1
## 4                         120                      150           1
## 5                         181                      243           3
## 6                         168                      243           3
## 7                         227                      293           3
## 8                         232                      298           3
## 9                          83                      110           1
## 10                         94                      124           1
## 11                        126                      124           1
## 12                        126                      160           1
## 13                        139                      180           1
## 15                        132                      174           1
## 16                        130                      171           1
## 17                        141                      181           1
## 18                        141                      182           1
## 19                        136                      180           1
## 20                        132                      175           1
## 21                        132                      184           1
## 22                        177                      224           1
## 23                        140                      208           1
## 24                        143                      211           1
## 25                        203                      261           3
## 26                        203                      261           3
## 27                        143                      208           1
## 28                        143                      210           1
## 29                        171                      226           3
## 30                        171                      226           3
## 31                        178                      246           3
## 32                        188                      269           3
## 33                        188                      269           3
## 34                         50                       50           2
## 35                         50                       50           2
## 36                         48                       65           2
## 37                         52                       76           2
## 38                         28                       28           2
## 39                         48                       69           2
## 40                         52                       76           2
## 41                         50                       50           2
## 42                        118                      149           1
## 43                        129                      160           1
## 44                         90                      120           1

number of seats in first class ;clustering with CAH 3 clusters

boxplot(classe1$dataFLotte.Seats..First.Class.,classe2$dataFLotte.Seats..First.Class.,classe3$dataFLotte.Seats..First.Class.)

number of seats in Business class ;clustering with CAH 3 clusters

boxplot(classe1$dataFLotte.Seats..Business.,classe2$dataFLotte.Seats..Business.,classe3$dataFLotte.Seats..Business.)

number of seats in Eco.Comfort class ;clustering with CAH 3 clusters

boxplot(classe1$dataFLotte.Seats..Eco.Comfort.,classe2$dataFLotte.Seats..Eco.Comfort.,classe3$dataFLotte.Seats..Eco.Comfort.)

number of seats in Economy class ;clustering with CAH 3 clusters

boxplot(classe1$dataFLotte.Seats..Economy.,classe2$dataFLotte.Seats..Economy.,classe3$dataFLotte.Seats..Economy.)

groupes.cah <- cutree(h2,k=4 )
plot(groupes.cah)

barplot(table(groupes.cah))

dataSeatsClasseCAH4=data.frame(dataSeats1,groupes.cah)
classe1=dataSeatsClasseCAH4[which(dataSeatsClasseCAH4$groupes.cah==1),]
classe2=dataSeatsClasseCAH4[which(dataSeatsClasseCAH4$groupes.cah==2),]
classe3=dataSeatsClasseCAH4[which(dataSeatsClasseCAH4$groupes.cah==3),]
classe4=dataSeatsClasseCAH4[which(dataSeatsClasseCAH4$groupes.cah==4),]
dataSeatsClasseCAH4
##             dataFLotte.Aircraft dataFLotte.Seats..First.Class.
## 1                   Airbus A319                             12
## 2               Airbus A319 VIP                             28
## 3                   Airbus A320                             12
## 4              Airbus A320 32-R                             12
## 5               Airbus A330-200                              0
## 6         Airbus A330-200 (3L2)                              0
## 7         Airbus A330-200 (3L3)                              0
## 8               Airbus A330-300                              0
## 9                    Boeing 717                             12
## 10         Boeing 737-700 (73W)                             12
## 11         Boeing 737-800 (738)                             16
## 12         Boeing 737-800 (73H)                             16
## 13       Boeing 737-900ER (739)                             20
## 15         Boeing 757-200 (75A)                             24
## 16         Boeing 757-200 (75E)                              0
## 17         Boeing 757-200 (75M)                             22
## 18         Boeing 757-200 (75N)                             22
## 19         Boeing 757-200 (757)                             24
## 20         Boeing 757-200 (75V)                             22
## 21         Boeing 757-200 (75X)                              0
## 22               Boeing 757-300                             24
## 23         Boeing 767-300 (76G)                              0
## 24         Boeing 767-300 (76L)                              0
## 25         Boeing 767-300 (76P)                             30
## 26         Boeing 767-300 (76Q)                             30
## 27         Boeing 767-300 (76T)                              0
## 28         Boeing 767-300 (76U)                             36
## 29     Boeing 767-300 (76Z V.1)                              0
## 30     Boeing 767-300 (76Z V.2)                              0
## 31         Boeing 767-400 (76D)                              0
## 32             Boeing 777-200ER                              0
## 33             Boeing 777-200LR                              0
## 34 CRJ 100/200 Pinnacle/SkyWest                              0
## 35       CRJ 100/200 ExpressJet                              0
## 36                      CRJ 700                              9
## 37                      CRJ 900                             12
## 38                         E120                              0
## 39                         E170                              9
## 40                         E175                             12
## 41                      ERJ-145                              0
## 42                        MD-88                             16
## 43                        MD-90                             16
## 44                    MD-DC9-50                             16
##    dataFLotte.Seats..Business. dataFLotte.Seats..Eco.Comfort.
## 1                            0                             18
## 2                           14                              0
## 3                            0                             18
## 4                            0                             18
## 5                           32                             30
## 6                           34                             32
## 7                           34                             32
## 8                           34                             32
## 9                            0                             15
## 10                           0                             18
## 11                           0                             18
## 12                           0                             18
## 13                           0                             21
## 15                           0                             18
## 16                          16                             25
## 17                           0                             18
## 18                           0                             19
## 19                           0                             20
## 20                           0                             21
## 21                          26                             26
## 22                           0                             23
## 23                          30                             38
## 24                          36                             32
## 25                           0                             28
## 26                           0                             28
## 27                          36                             29
## 28                           0                             31
## 29                          26                             35
## 30                          26                             29
## 31                          40                             28
## 32                          45                             36
## 33                          45                             36
## 34                           0                              0
## 35                           0                              0
## 36                           0                              8
## 37                           0                             12
## 38                           0                              0
## 39                           0                             12
## 40                           0                             12
## 41                           0                              0
## 42                           0                             15
## 43                           0                             15
## 44                           0                             14
##    dataFLotte.Seats..Economy. dataFLotte.Accommodation groupes.cah
## 1                          96                      126           1
## 2                           0                       54           2
## 3                         120                      150           1
## 4                         120                      150           1
## 5                         181                      243           3
## 6                         168                      243           3
## 7                         227                      293           3
## 8                         232                      298           3
## 9                          83                      110           1
## 10                         94                      124           1
## 11                        126                      124           1
## 12                        126                      160           1
## 13                        139                      180           4
## 15                        132                      174           4
## 16                        130                      171           4
## 17                        141                      181           4
## 18                        141                      182           4
## 19                        136                      180           4
## 20                        132                      175           4
## 21                        132                      184           4
## 22                        177                      224           4
## 23                        140                      208           4
## 24                        143                      211           4
## 25                        203                      261           3
## 26                        203                      261           3
## 27                        143                      208           4
## 28                        143                      210           4
## 29                        171                      226           3
## 30                        171                      226           3
## 31                        178                      246           3
## 32                        188                      269           3
## 33                        188                      269           3
## 34                         50                       50           2
## 35                         50                       50           2
## 36                         48                       65           2
## 37                         52                       76           2
## 38                         28                       28           2
## 39                         48                       69           2
## 40                         52                       76           2
## 41                         50                       50           2
## 42                        118                      149           1
## 43                        129                      160           1
## 44                         90                      120           1

number of seats in first class ;clustering CAH with 4 class

boxplot(classe1$dataFLotte.Seats..First.Class.,classe2$dataFLotte.Seats..First.Class.,classe3$dataFLotte.Seats..First.Class.,classe4$dataFLotte.Seats..First.Class.)

number of seats in Business class;clustering CAH with 4 class

boxplot(classe1$dataFLotte.Seats..Business.,classe2$dataFLotte.Seats..Business.,classe3$dataFLotte.Seats..Business.,classe4$dataFLotte.Seats..Business.)

number of seats in Eco.Comfort class;clustering CAH with 4 class

boxplot(classe1$dataFLotte.Seats..Eco.Comfort.,classe2$dataFLotte.Seats..Eco.Comfort.,classe3$dataFLotte.Seats..Eco.Comfort.,classe4$dataFLotte.Seats..Eco.Comfort.)

number of seats in Economy class;clustering CAH with 4 class

boxplot(classe1$dataFLotte.Seats..Economy.,classe2$dataFLotte.Seats..Economy.,classe3$dataFLotte.Seats..Economy.,classe4$dataFLotte.Seats..Economy.)

now project the classes on the factorial plane using PCA to have more information of caracteristics of our clusters

res <- HCPC(result1,graph = FALSE)
plot(res, choice = "3D.map")

plot(res, choice = "tree")

plot(res, choice = "bar")

plot(res, choice = "map")

fviz_dend(res, 
          cex = 0.7,                     # Taille du text
          palette = "jco",               # Palette de couleur ?ggpubr::ggpar
          rect = TRUE, rect_fill = TRUE, # Rectangle autour des groupes
          rect_border = "jco",           # Couleur du rectangle
          labels_track_height = 0.8 )

fviz_cluster(res,
            repel = TRUE,            # Evite le chevauchement des textes
            show.clust.cent = TRUE, # Montre le centre des clusters
            palette = "jco",         # Palette de couleurs, voir ?ggpubr::ggpar
            ggtheme = theme_minimal(),
            main = "Factor map"
)

Kmeans method :

result=kmeans(dataSeats[,2:6],3)
result
## K-means clustering with 3 clusters of sizes 10, 13, 21
## 
## Cluster means:
##   dataFLotte.Seats..First.Class. dataFLotte.Seats..Business.
## 1                       8.200000                    1.400000
## 2                       6.461538                   28.000000
## 3                      14.190476                    6.857143
##   dataFLotte.Seats..Eco.Comfort. dataFLotte.Seats..Economy.
## 1                        5.90000                    46.1000
## 2                       31.61538                   197.9231
## 3                       21.42857                   127.1905
##   dataFLotte.Accommodation
## 1                  62.8000
## 2                 264.2308
## 3                 167.9524
## 
## Clustering vector:
##  [1] 3 1 3 3 2 2 2 2 1 3 3 3 3 2 3 3 3 3 3 3 3 2 3 3 2 2 3 3 2 2 2 2 2 1 1
## [36] 1 1 1 1 1 1 3 3 3
## 
## Within cluster sum of squares by cluster:
## [1]  9591.40 39501.54 28307.14
##  (between_SS / total_SS =  82.7 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"
seatsclasse=data.frame(dataSeats,result$cluster)
View(seatsclasse)
classe1=seatsclasse[which(seatsclasse$result.cluster==1),]
classe2=seatsclasse[which(seatsclasse$result.cluster==2),]
classe3=seatsclasse[which(seatsclasse$result.cluster==3),]

number of seats in first class

boxplot(classe1$dataFLotte.Seats..First.Class.,classe2$dataFLotte.Seats..First.Class.,classe3$dataFLotte.Seats..First.Class.)

number of seats in business class

boxplot(classe1$dataFLotte.Seats..Business.,classe2$dataFLotte.Seats..Business.,classe3$dataFLotte.Seats..Business.)

number of seats in Eco Comfort

boxplot(classe1$dataFLotte.Seats..Eco.Comfort.,classe2$dataFLotte.Seats..Eco.Comfort.,classe3$dataFLotte.Seats..Eco.Comfort.)

number of seats in Economic class

boxplot(classe1$dataFLotte.Seats..Economy.,classe2$dataFLotte.Seats..Economy.,classe3$dataFLotte.Seats..Economy.)

number of total seats for each cluster

boxplot(classe1$dataFLotte.Accommodation,classe2$dataFLotte.Accommodation,classe3$dataFLotte.Accommodation)

this segmentation reminds us of the one made by PCA

seatsclasse[,7]=as.factor(seatsclasse[,7])
result2=PCA(seatsclasse,ind.sup =2,quali.sup =c(1,7))

plot.PCA(result2,choix="ind",habillage=7)

## PCA about secondary options :

library(factoextra)
library(FactoMineR)

#data:
dataFLotteopt = read.table(file = file.choose(), sep=";", header=TRUE , dec=".")

View(dataFLotteopt)
summary(dataFLotteopt)
##                   Aircraft  Seat.Width..Club. Seat.Pitch..Club.
##  Airbus A319          : 1   Min.   : 0.0000   Min.   : 0       
##  Airbus A319 VIP      : 1   1st Qu.: 0.0000   1st Qu.: 0       
##  Airbus A320          : 1   Median : 0.0000   Median : 0       
##  Airbus A320 32-R     : 1   Mean   : 0.4409   Mean   : 1       
##  Airbus A330-200      : 1   3rd Qu.: 0.0000   3rd Qu.: 0       
##  Airbus A330-200 (3L2): 1   Max.   :19.4000   Max.   :44       
##  (Other)              :38                                      
##   Seat..Club.      Seat.Width..First.Class. Seat.Pitch..First.Class.
##  Min.   : 0.0000   Min.   : 0.00            Min.   : 0.00           
##  1st Qu.: 0.0000   1st Qu.: 0.00            1st Qu.: 0.00           
##  Median : 0.0000   Median :18.95            Median :36.00           
##  Mean   : 0.2727   Mean   :11.49            Mean   :21.85           
##  3rd Qu.: 0.0000   3rd Qu.:21.00            3rd Qu.:37.00           
##  Max.   :12.0000   Max.   :21.00            Max.   :60.00           
##                                                                     
##  Seats..First.Class. Seat.Width..Business. Seat.Pitch..Business.
##  Min.   : 0.00       Min.   : 0.000        Min.   : 0.00        
##  1st Qu.: 0.00       1st Qu.: 0.000        1st Qu.: 0.00        
##  Median :12.00       Median : 0.000        Median : 0.00        
##  Mean   :10.55       Mean   : 7.545        Mean   :25.55        
##  3rd Qu.:17.00       3rd Qu.:21.000        3rd Qu.:60.00        
##  Max.   :36.00       Max.   :21.000        Max.   :82.00        
##                                                                 
##  Seats..Business. Seat.Width..Eco.Comfort. Seat.Pitch..Eco.Comfort.
##  Min.   : 0.00    Min.   : 0.00            Min.   : 0.00           
##  1st Qu.: 0.00    1st Qu.:17.20            1st Qu.:34.00           
##  Median : 0.00    Median :17.30            Median :34.00           
##  Mean   :11.86    Mean   :15.65            Mean   :30.58           
##  3rd Qu.:27.00    3rd Qu.:18.00            3rd Qu.:35.00           
##  Max.   :48.00    Max.   :18.50            Max.   :36.00           
##                                                                    
##  Seats..Eco.Comfort. Seat.Width..Economy. Seat.Pitch..Economy.
##  Min.   : 0.00       Min.   : 0.00        Min.   : 0.00       
##  1st Qu.:15.00       1st Qu.:17.20        1st Qu.:30.50       
##  Median :19.50       Median :17.60        Median :31.00       
##  Mean   :20.91       Mean   :17.25        Mean   :30.41       
##  3rd Qu.:29.25       3rd Qu.:18.00        3rd Qu.:31.50       
##  Max.   :42.00       Max.   :18.50        Max.   :32.00       
##                                                               
##  Seats..Economy. Accommodation   Cruising.Speed..mph. Range..miles.  
##  Min.   :  0.0   Min.   : 28.0   Min.   :364.0        Min.   :  731  
##  1st Qu.: 93.0   1st Qu.:123.0   1st Qu.:517.0        1st Qu.: 1944  
##  Median :132.0   Median :177.5   Median :517.0        Median : 3174  
##  Mean   :129.7   Mean   :172.5   Mean   :515.8        Mean   : 3869  
##  3rd Qu.:171.0   3rd Qu.:226.0   3rd Qu.:517.0        3rd Qu.: 6221  
##  Max.   :286.0   Max.   :376.0   Max.   :564.0        Max.   :10375  
##                                                                      
##     Engines      Wingspan..ft.    Tail.Height..ft.  Length..ft.    
##  Min.   :2.000   Min.   : 64.92   Min.   :20.42    Min.   : 65.58  
##  1st Qu.:2.000   1st Qu.:107.83   1st Qu.:32.23    1st Qu.:116.94  
##  Median :2.000   Median :129.79   Median :44.50    Median :155.25  
##  Mean   :2.045   Mean   :132.10   Mean   :42.57    Mean   :149.94  
##  3rd Qu.:2.000   3rd Qu.:156.08   3rd Qu.:52.00    3rd Qu.:180.25  
##  Max.   :4.000   Max.   :213.00   Max.   :62.54    Max.   :231.83  
##                                                                    
##       Wifi            Video            Power          Satellite     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.0000   Median :1.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.5682   Mean   :0.6136   Mean   :0.5682   Mean   :0.1364  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##                                                                     
##     Flat.bed         Sleeper            Club          First.Class    
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.00000   Median :1.0000  
##  Mean   :0.2045   Mean   :0.1364   Mean   :0.02273   Mean   :0.5455  
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.0000  
##                                                                      
##     Business       Eco.Comfort        Economy      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:1.0000  
##  Median :0.0000   Median :1.0000   Median :1.0000  
##  Mean   :0.3864   Mean   :0.8864   Mean   :0.9773  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
## 
dataopt=data.frame(dataFLotteopt$Aircraft,dataFLotteopt$Length..ft.,dataFLotteopt$Wifi,dataFLotteopt$Video,dataFLotteopt$Power,dataFLotteopt$Satellite,dataFLotteopt$Flat.bed)
View(dataopt)
result7=PCA(dataopt,quali.sup =1)

inertia distribution :

eig.val <- get_eigenvalue(result7)
fviz_eig(result7, addlabels = TRUE, ylim = c(0, 50))

#correlation of variables (opt) :

fviz_pca_var(result7, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Évite le chevauchement de texte,
             
)

#projection quality of options in dim 1 and 2:
fviz_cos2(result7, choice = "var", axes = 1:2)

#plot of airline:
fviz_pca_ind(result7,
             geom.ind = "point", # Montre les points seulement (mais pas le "text")
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Ellipses de concentration
             legend.title = "Groups"
             
             
)

the following graph shows us: * satellite option is strongly positively correlated with 2nd axis * options: power, flatbed video and airplane length are positively correlated with 1st axis * wifi option is positively correlated with 2nd axis and negatively with the first This is 3 remarks help us to distinguish the following results: * The aircraft 10,12,21 and 26 (Boeing 737-700 (73W), Boeing 737-800 (73H), Boeing 757-200 (75X), Boeing 767-300 (76Q)), generally the boeing plane, contain this options among other aircraft For example, aircraft number 14 (Boeing 747-400 (74S)) contain most of the options (power, length, flatbed, video) * also planes 38 (E120) and 41 (ERJ-145) do not contain any options (they are distorted for economy class)

projection of individuals(airlines):

fviz_pca_biplot(result7, axes = c(1, 2),
                label = "all", invisible = "none", labelsize = 4, pointsize = 2)

numbers of planes by options:

dataopt=as.matrix(dataopt)
barplot(dataopt[,3:7],names = c("wifi","video","power","satellite","flat-bed"),main="nombres des avions par options",xlab="les options",ylab="nombre des avions")

fleet segmentation with options of airline:

1/db-scan:Density-based spatial clustering of applications with noise:

dataopt=as.data.frame(dataopt)
str(dataopt)
## 'data.frame':    44 obs. of  7 variables:
##  $ dataFLotteopt.Aircraft   : Factor w/ 44 levels "Airbus A319",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ dataFLotteopt.Length..ft.: Factor w/ 24 levels " 65.583"," 87.830",..: 8 8 11 11 20 20 22 22 10 7 ...
##  $ dataFLotteopt.Wifi       : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 1 2 2 ...
##  $ dataFLotteopt.Video      : Factor w/ 2 levels "0","1": 1 2 1 1 2 2 2 2 1 2 ...
##  $ dataFLotteopt.Power      : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 2 2 ...
##  $ dataFLotteopt.Satellite  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
##  $ dataFLotteopt.Flat.bed   : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 2 1 1 1 ...
dataopt$dataFLotteopt.Length..ft.=as.numeric(dataopt$dataFLotteopt.Length..ft.)
dataopt$dataFLotteopt.Wifi=as.numeric(dataopt$dataFLotteopt.Wifi)
dataopt$dataFLotteopt.Video=as.numeric(dataopt$dataFLotteopt.Video)
dataopt$dataFLotteopt.Power=as.numeric(dataopt$dataFLotteopt.Power)
dataopt$dataFLotteopt.Satellite=as.numeric(dataopt$dataFLotteopt.Satellite)
dataopt$dataFLotteopt.Flat.bed=as.numeric(dataopt$dataFLotteopt.Flat.bed)
str(dataopt)
## 'data.frame':    44 obs. of  7 variables:
##  $ dataFLotteopt.Aircraft   : Factor w/ 44 levels "Airbus A319",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ dataFLotteopt.Length..ft.: num  8 8 11 11 20 20 22 22 10 7 ...
##  $ dataFLotteopt.Wifi       : num  2 2 2 2 1 1 1 1 2 2 ...
##  $ dataFLotteopt.Video      : num  1 2 1 1 2 2 2 2 1 2 ...
##  $ dataFLotteopt.Power      : num  1 1 1 1 2 2 2 2 2 2 ...
##  $ dataFLotteopt.Satellite  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ dataFLotteopt.Flat.bed   : num  1 1 1 1 2 1 2 1 1 1 ...
d=dataopt[,2:7]
library("dbscan")
## Warning: package 'dbscan' was built under R version 3.4.3
library("ANN2")
## Warning: package 'ANN2' was built under R version 3.4.3
library("ggplot2")

#run dbscan
db <- dbscan(d, eps = 1, minPts = 3)
db
## DBSCAN clustering for 44 objects.
## Parameters: eps = 1, minPts = 3
## The clustering contains 6 cluster(s) and 10 noise points.
## 
##  0  1  2  3  4  5  6 
## 10  3  3 14  7  4  3 
## 
## Available fields: cluster, eps, minPts
#Scatterplot Matrices entre les options
plot(d, col=db$cluster)

hullplot(d, db)

#6 clusters
dataOption=data.frame(d,db$cluster)

classe1=dataOption[which(dataOption$db.cluster==1),]
classe2=dataOption[which(dataOption$db.cluster==2),]
classe3=dataOption[which(dataOption$db.cluster==3),]
classe4=dataOption[which(dataOption$db.cluster==4),]
classe5=dataOption[which(dataOption$db.cluster==5),]
classe6=dataOption[which(dataOption$db.cluster==6),]
dataOption
##    dataFLotteopt.Length..ft. dataFLotteopt.Wifi dataFLotteopt.Video
## 1                          8                  2                   1
## 2                          8                  2                   2
## 3                         11                  2                   1
## 4                         11                  2                   1
## 5                         20                  1                   2
## 6                         20                  1                   2
## 7                         22                  1                   2
## 8                         22                  1                   2
## 9                         10                  2                   1
## 10                         7                  2                   2
## 11                         7                  2                   2
## 12                        12                  2                   2
## 13                        14                  2                   2
## 14                        24                  1                   2
## 15                        17                  1                   2
## 16                        17                  2                   2
## 17                        17                  2                   1
## 18                        17                  2                   1
## 19                        11                  2                   2
## 20                        17                  2                   2
## 21                        17                  2                   2
## 22                        18                  2                   2
## 23                        19                  1                   2
## 24                        19                  1                   2
## 25                        19                  2                   2
## 26                        19                  2                   2
## 27                        19                  1                   2
## 28                        19                  1                   2
## 29                        19                  1                   2
## 30                        19                  1                   2
## 31                        21                  1                   2
## 32                        23                  1                   2
## 33                        23                  1                   2
## 34                         2                  1                   1
## 35                         2                  1                   1
## 36                         6                  2                   1
## 37                         9                  2                   1
## 38                         1                  1                   1
## 39                         4                  2                   1
## 40                         5                  2                   1
## 41                         3                  1                   1
## 42                        15                  2                   1
## 43                        16                  2                   1
## 44                        13                  2                   1
##    dataFLotteopt.Power dataFLotteopt.Satellite dataFLotteopt.Flat.bed
## 1                    1                       1                      1
## 2                    1                       1                      1
## 3                    1                       1                      1
## 4                    1                       1                      1
## 5                    2                       1                      2
## 6                    2                       1                      1
## 7                    2                       1                      2
## 8                    2                       1                      1
## 9                    2                       1                      1
## 10                   2                       2                      1
## 11                   2                       2                      1
## 12                   2                       2                      1
## 13                   2                       1                      1
## 14                   2                       1                      2
## 15                   1                       1                      1
## 16                   2                       1                      1
## 17                   1                       1                      1
## 18                   1                       1                      1
## 19                   1                       1                      1
## 20                   1                       1                      1
## 21                   2                       2                      1
## 22                   1                       1                      1
## 23                   2                       1                      1
## 24                   2                       1                      2
## 25                   2                       2                      1
## 26                   2                       2                      1
## 27                   2                       1                      2
## 28                   2                       1                      1
## 29                   2                       1                      2
## 30                   2                       1                      1
## 31                   2                       1                      2
## 32                   2                       1                      2
## 33                   2                       1                      2
## 34                   1                       1                      1
## 35                   1                       1                      1
## 36                   1                       1                      1
## 37                   1                       1                      1
## 38                   1                       1                      1
## 39                   1                       1                      1
## 40                   1                       1                      1
## 41                   1                       1                      1
## 42                   2                       1                      1
## 43                   2                       1                      1
## 44                   1                       1                      1
##    db.cluster
## 1           1
## 2           1
## 3           2
## 4           2
## 5           3
## 6           3
## 7           3
## 8           3
## 9           0
## 10          0
## 11          0
## 12          0
## 13          0
## 14          3
## 15          4
## 16          4
## 17          4
## 18          4
## 19          2
## 20          4
## 21          4
## 22          4
## 23          3
## 24          3
## 25          0
## 26          0
## 27          3
## 28          3
## 29          3
## 30          3
## 31          3
## 32          3
## 33          3
## 34          5
## 35          5
## 36          6
## 37          1
## 38          5
## 39          6
## 40          6
## 41          5
## 42          0
## 43          0
## 44          0
View(dataOption)

option length with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Length..ft.,classe2$dataFLotteopt.Length..ft.,classe3$dataFLotteopt.Length..ft.,classe4$dataFLotteopt.Length..ft.,classe5$dataFLotteopt.Length..ft.,classe6$dataFLotteopt.Length..ft.,col="violet")

option wifi with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Wifi,classe2$dataFLotteopt.Wifi,classe3$dataFLotteopt.Length..ft.,classe4$dataFLotteopt.Wifi,classe5$dataFLotteopt.Wifi,classe6$dataFLotteopt.Wifi,col="blue")

option video with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Video,classe2$dataFLotteopt.Video,classe3$dataFLotteopt.Video,classe4$dataFLotteopt.Video,classe5$dataFLotteopt.Video,classe6$dataFLotteopt.Video,col="red")

option power with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Power,classe2$dataFLotteopt.Power,classe3$dataFLotteopt.Power,classe4$dataFLotteopt.Power,classe5$dataFLotteopt.Power,classe6$dataFLotteopt.Power.Video,col="yellow")
## Warning in is.na(x): is.na() appliqué à un objet de type 'NULL' qui n'est
## ni une liste, ni un vecteur

## Warning in is.na(x): is.na() appliqué à un objet de type 'NULL' qui n'est
## ni une liste, ni un vecteur

## Warning in is.na(x): is.na() appliqué à un objet de type 'NULL' qui n'est
## ni une liste, ni un vecteur

option satellite with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Satellite,classe1$dataFLotteopt.Satellite,classe1$dataFLotteopt.Satellite,classe1$dataFLotteopt.Satellite,classe1$dataFLotteopt.Satellite,classe1$dataFLotteopt.Satellite,col="red3")

noise points are those which represent the planes seen in acp of which they are correlated with the 2nd axis and contain the option satellite Example: airline 10 and 12 with type Boeing

option flatbed with 6 clusters of db-scan

boxplot(classe1$dataFLotteopt.Flat.bed,classe2$dataFLotteopt.Flat.bed,classe3$dataFLotteopt.Flat.bed,classe4$dataFLotteopt.Flat.bed,classe5$dataFLotteopt.Flat.bed,classe6$dataFLotteopt.Flat.bed,col="grey")

Conclusion:

  • cluster1 (green): these planes are generally equipped with wifi and video options and they are medium-sized planes because of their length compared to the planes of the other segments, for example vions 13: Boeing 737- 900ER (739) and 16: Boeing 757-200 (75E) that exists in the center of plan (1,2) in the ACP

*cluster2 (red): these planes are larger than clluster1 but they are medium-sized remains, they are characterized by the existence of the wifi option as aircraft number 22: Boeing 757-300

  • cluster3 (blue): it represents the largest planes of pt of view size, also the planes which are well equipped by the totality of the options, it is represented in blue in the graph of db-scan and granted corollated planes to the first axis in acp as aircraft 14: Boeing 747-400 (74S)

*cluster4 (sky blue):represents the moderately large size class (close to cluster3 size) also the most equipped aircraft by video option like aircraft number 8 Airbus A330-300

*cluster5 (pink): represents smaller planes that are not equipped with any option (are the planes designed for economy class) such as aircraft number 35: CRJ 100/200 ExpressJet , 38: E120 and 41: ERJ-145.

  • cluster6 (gray): represents airplanes of the eco-comfort class generally with some space for the first class so they are equipped with the option wifi and a size larger than those of class planes economic

*the noise points: represent the well-equipped planes by the satellite option like the plane 10 and 12

Studing Other options in differents clusters with CA:

to study other aspects of the world fleet we used another richer database

allplaines=read.table(file = file.choose(),header = TRUE,dec = ".", sep = ",")
head(allplaines)
##           Airline      AircraftWithSeatmap SeatPitch SeatWidth
## 1 Aegean Airlines        Airbus A321 (321)      33.0        18
## 2      Aer Lingus Airbus A330-200 (332) V1      58.0        21
## 3      Aer Lingus    Airbus A330-300 (333)      58.0        21
## 4      Aer Lingus     Boeing 757-200 (752)      60.0        22
## 5      Aer Lingus Airbus A330-200 (332) V2      58.0        21
## 6        Aeroflot Airbus A330-300 (333) V1      60.2        26
##      VideoType LaptopPower PowerType Wi.Fi       SeatType      Class
## 1  Overhead TV        None      None    No       Standard LHBusiness
## 2 On-Demand TV   All Seats  AC Power   Yes       Flat Bed LHBusiness
## 3 On-Demand TV   All Seats  AC Power   Yes       Flat Bed LHBusiness
## 4 On-Demand TV   All Seats  AC Power    No       Flat Bed LHBusiness
## 5 On-Demand TV   All Seats  AC Power   Yes       Flat Bed LHBusiness
## 6 On-Demand TV   All Seats  AC Power   Yes Angle Lie Flat LHBusiness

We will opt in this part to do the CA to release the average profile of the world fleet and to make it we transformed our base into a table of contingencies

feqTab=read.table(file = file.choose(),header = TRUE,dec = ".",sep = ",")
head(feqTab)
##   X              airlines videoNonelist videoOnDemandlist
## 1 1       Aegean Airlines             5                 0
## 2 2            Aer Lingus             2                 8
## 3 3              Aeroflot             6                12
## 4 4 Aerolineas Argentinas             4                 0
## 5 5            Aeromexico             5                14
## 6 6            Air Canada             5                41
##   videoOverHeadlist videoPortablelist videoSatellitelist videoSeatbacklist
## 1                 4                 0                  0                 0
## 2                 0                 0                  0                 0
## 3                 0                 6                  0                 1
## 4                 1                 0                  0                 3
## 5                 3                 0                  0                 0
## 6                 0                 9                  0                 0
##   powerAllSeatslist powersomeSeatslist powernonelist WifiNolist
## 1                 0                  0             9          9
## 2                 7                  0             3          4
## 3                16                  0             9         14
## 4                 0                  0             8          6
## 5                 3                  0            19         14
## 6                33                 13             9         48
##   WifiYeslist SeatAngleLieFalatlist SeatClosedSuitelist SeatFlatBedlist
## 1           0                     0                   0               0
## 2           6                     0                   0               4
## 3          11                     4                   0               1
## 4           2                     2                   0               0
## 5           8                     1                   0               2
## 6           7                     0                   0               4
##   SeatReclinerlist SeatOpenSuitelist SeatStandardlist ClassLHBusinesslist
## 1                0                 0                9                   1
## 2                0                 0                6                   4
## 3                8                 0               12                   5
## 4                2                 0                4                   2
## 5                3                 0               16                   3
## 6               15                 5               31                  10
##   ClassLHEconomylist ClassLHFirstlist ClassPremEcolist ClassSHEconomylist
## 1                  1                0                0                  5
## 2                  4                0                0                  2
## 3                  5                0                1                  7
## 4                  2                0                0                  2
## 5                  4                0                2                 10
## 6                 11                0                6                 19
##   ClassSHFirst_businesslist
## 1                         2
## 2                         0
## 3                         7
## 4                         2
## 5                         3
## 6                         9
feqTab=feqTab[,3:25]
dt <- as.table(as.matrix (feqTab))

ress=CA(feqTab,graph = TRUE)

plot.CA(ress,invisible = "col",axes =3:4)

plot.CA(ress,invisible = "col",axes =1:2)

plot.CA(ress,invisible = "col",axes =c(1,4))

plot.CA(ress,invisible = "col",axes =c(1,3))

plot.CA(ress,invisible = "col",axes =c(2,4))

plot.CA(ress,invisible = "col",axes =c(2,3))

plot.CA(ress,invisible = "row",axes =3:4)

plot.CA(ress,invisible = "row",axes =1:2)

plot.CA(ress,invisible = "row",axes =c(1,4))

plot.CA(ress,invisible = "row",axes =c(1,3))

plot.CA(ress,invisible = "row",axes =c(2,4))

plot.CA(ress,invisible = "row",axes =c(2,3))

ress$col$contrib
##                                 Dim 1        Dim 2        Dim 3
## videoNonelist             19.32611585 7.868694e+00 3.745716e-02
## videoOnDemandlist         13.67766852 2.332672e-04 6.477920e-03
## videoOverHeadlist          1.34712561 1.203668e+01 6.184462e-01
## videoPortablelist          0.23205955 2.384432e-02 4.405033e+01
## videoSatellitelist         0.07164354 2.471761e+00 4.920229e+01
## videoSeatbacklist          0.03645092 8.981892e-01 1.201527e+00
## powerAllSeatslist         10.97952021 1.079499e+00 1.088006e-01
## powersomeSeatslist         0.88057202 1.754929e-01 3.385283e-04
## powernonelist             12.01893595 1.241006e+00 9.763545e-02
## WifiNolist                 1.14869126 2.049700e+01 6.372128e-02
## WifiYeslist                2.54681901 4.590084e+01 1.470905e-01
## SeatAngleLieFalatlist      1.04055223 2.231312e+00 2.058463e-02
## SeatClosedSuitelist        2.29653649 6.337368e-03 2.092256e-01
## SeatFlatBedlist            4.09543271 6.361641e-01 2.538286e-01
## SeatReclinerlist           0.18326268 3.741532e-02 1.803515e+00
## SeatOpenSuitelist          1.03096763 9.951732e-02 1.055514e-04
## SeatStandardlist           3.68014207 4.117455e-01 1.622015e-01
## ClassLHBusinesslist        4.14343230 1.115904e+00 2.647042e-01
## ClassLHEconomylist         2.54404229 1.929653e-01 1.878562e-01
## ClassLHFirstlist           3.99179160 8.705174e-01 1.079273e-03
## ClassPremEcolist           0.85542478 4.116558e-04 1.141408e+00
## ClassSHEconomylist        13.27884317 2.176839e+00 1.342732e-02
## ClassSHFirst_businesslist  0.59396962 2.764007e-02 4.079429e-01
##                                  Dim 4      Dim 5
## videoNonelist             1.123396e+00 17.5088192
## videoOnDemandlist         3.529154e-01  0.6477534
## videoOverHeadlist         2.800061e+00 44.7035374
## videoPortablelist         4.325153e+01  1.2288282
## videoSatellitelist        4.229722e+01  2.0356605
## videoSeatbacklist         6.025549e-01  0.3136155
## powerAllSeatslist         2.463480e-01  0.1983815
## powersomeSeatslist        2.605507e+00  2.7440790
## powernonelist             7.689015e-01  0.7009572
## WifiNolist                1.486931e+00  4.1987087
## WifiYeslist               3.314637e+00  9.3716958
## SeatAngleLieFalatlist     5.601429e-02  0.3868666
## SeatClosedSuitelist       4.560790e-01  2.6036452
## SeatFlatBedlist           3.866178e-04  1.1137736
## SeatReclinerlist          7.144350e-02  4.3863323
## SeatOpenSuitelist         2.146411e-01  0.1074232
## SeatStandardlist          1.016360e-02  0.1240526
## ClassLHBusinesslist       7.150024e-02  0.1788201
## ClassLHEconomylist        3.202444e-04  0.2655740
## ClassLHFirstlist          1.277713e-01  2.0785023
## ClassPremEcolist          2.578266e-02  0.3190988
## ClassSHEconomylist        7.949847e-02  0.1143678
## ClassSHFirst_businesslist 3.639562e-02  4.6695072

the characteristics of the average profile are all around the origin, by examining the coordinates of the variables, we found that the average global profile is summarized as follows:

Seat:recliner

Wfi:No_wifi

Power :some seats

video:Portable TV satellite TV et seatback tv

classe : Premium economy , first class , business class

Modeling

#ACP
library(FactoMineR)
result_pca <- PCA(data[,2:33], graph = FALSE, ncp=3)
result_pca
## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 44 individuals, described by 32 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"
plot(result_pca)

library("factoextra")
eig.val <- get_eigenvalue(result_pca)
eig.val
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  1.349103e+01     4.215946e+01                    42.15946
## Dim.2  7.164241e+00     2.238825e+01                    64.54772
## Dim.3  4.917618e+00     1.536756e+01                    79.91527
## Dim.4  1.499489e+00     4.685902e+00                    84.60117
## Dim.5  1.330906e+00     4.159082e+00                    88.76026
## Dim.6  1.053693e+00     3.292790e+00                    92.05305
## Dim.7  7.891997e-01     2.466249e+00                    94.51930
## Dim.8  5.070555e-01     1.584548e+00                    96.10384
## Dim.9  3.499964e-01     1.093739e+00                    97.19758
## Dim.10 2.859749e-01     8.936716e-01                    98.09125
## Dim.11 1.700615e-01     5.314422e-01                    98.62270
## Dim.12 1.341398e-01     4.191870e-01                    99.04188
## Dim.13 9.131570e-02     2.853616e-01                    99.32725
## Dim.14 5.817774e-02     1.818054e-01                    99.50905
## Dim.15 4.008013e-02     1.252504e-01                    99.63430
## Dim.16 3.157443e-02     9.867009e-02                    99.73297
## Dim.17 2.133028e-02     6.665711e-02                    99.79963
## Dim.18 1.960192e-02     6.125600e-02                    99.86088
## Dim.19 1.406561e-02     4.395504e-02                    99.90484
## Dim.20 1.113021e-02     3.478190e-02                    99.93962
## Dim.21 7.443154e-03     2.325986e-02                    99.96288
## Dim.22 6.277314e-03     1.961661e-02                    99.98250
## Dim.23 2.689596e-03     8.404989e-03                    99.99090
## Dim.24 2.044121e-03     6.387877e-03                    99.99729
## Dim.25 4.953655e-04     1.548017e-03                    99.99884
## Dim.26 2.394094e-04     7.481544e-04                    99.99959
## Dim.27 1.188629e-04     3.714464e-04                    99.99996
## Dim.28 1.338488e-05     4.182776e-05                   100.00000
## Dim.29 3.621700e-30     1.131781e-29                   100.00000
## Dim.30 1.836758e-30     5.739870e-30                   100.00000
## Dim.31 1.004447e-31     3.138896e-31                   100.00000
## Dim.32 1.004447e-31     3.138896e-31                   100.00000
fviz_eig(result_pca, addlabels = TRUE, ylim = c(0, 50))

fviz_pca_var(result_pca, col.var = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Évite le chevauchement de texte,
             
)

library("rgl")
## Warning: package 'rgl' was built under R version 3.4.3
plot3d(x=result_pca$ind$coord[,"Dim.1"],y=result_pca$ind$coord[,"Dim.2"],z=result_pca$ind$coord[,"Dim.3"],xlab = "dim1",choix = "ind",ylab = "dim2",zlab = "dim3")

CAH clustering taking into account outliers

#centrage et réduction
data_scale=scale(data[,c(2:33)])
d=dist(data_scale,method="manhattan")
cah <- hclust(d, method="ward.D2")
barplot(cah$height) 

#visualisation du methode
plot(cah) 

groupes.cah <- cutree(cah,k=4 )
plot(groupes.cah)

table(groupes.cah)
## groupes.cah
##  1  2  3  4 
## 23  1 16  4
barplot(table(groupes.cah))

#segmentation n'est pas équitable
fviz_cos2(result_pca, choice = "var", axes = 1:2)

inertie <- sort(cah$height, decreasing = TRUE)
plot(inertie[1:10], type = "s", xlab = "Nombre de classes", 
     ylab = "Inertie")
points(c(2, 3, 4,6), inertie[c(2, 3,4,6)], col = c("green3", 
                                               "red3", "blue3","yellow3"), cex = 2, lwd = 3)

plot(cah, labels = FALSE, main = "Partition en 2, 3,4 ou 5 classes", 
     xlab = "", ylab = "", sub = "", axes = FALSE, hang = -1)
rect.hclust(cah, 2, border = "green3")
rect.hclust(cah, 3, border = "red3")
rect.hclust(cah, 4, border = "blue3")
rect.hclust(cah, 6, border = "yellow3")

library(devtools)
## Warning: package 'devtools' was built under R version 3.4.3
library(JLutils)
## Loading required package: plyr
best.cutree(cah)
## [1] 4
#the best.cutree function looks what would be the best partition between 2, 3,4 and 6 classes
#The partition into 4 classes will be best depending on the result.
best.cutree(cah, graph = TRUE, xlab = "Nombre de classes", 
            ylab = "Inertie relative")

## [1] 4
library(RColorBrewer)
A2Rplot(cah, k = 4, boxes = FALSE, col.up = "gray50", 
        col.down = brewer.pal(4, "Dark2"), show.labels = FALSE)

library(pvclust)
## Warning: package 'pvclust' was built under R version 3.4.3
fit <- pvclust(data_scale, method.hclust="ward.D2",
               method.dist="euclidean")
## Bootstrap (r = 0.5)... Done.
## Bootstrap (r = 0.59)... Done.
## Bootstrap (r = 0.68)... Done.
## Bootstrap (r = 0.8)... Done.
## Bootstrap (r = 0.89)... Done.
## Bootstrap (r = 1.0)... Done.
## Bootstrap (r = 1.09)... Done.
## Bootstrap (r = 1.18)... Done.
## Bootstrap (r = 1.3)... Done.
## Bootstrap (r = 1.39)... Done.
plot(fit) # dendogram with p values
# add rectangles around groups highly supported by the data
pvrect(fit, alpha=.95)

#clustering en éliminant les points abbérants

data_scale_spab=scale(data[-2,c(2:33)])

d1=dist(data_scale_spab,method="manhattan")

cah2 <- hclust(d1, method="ward.D2")
plot(cah2) 

groupes.cah <- cutree(cah2,k=3)
plot(groupes.cah)

table(groupes.cah)
## groupes.cah
##  1  2  3 
## 24 15  4
barplot(table(groupes.cah))

#segmentation n'est pas équitable

HCPC (CAH+ACP)

The HCPC function (Hierarchical Classification on Principal Components) allows unsupervised classification of individuals. This function combines the main factors, heroic classification and partitioning to better visualize and emphasize the similarities between individuals. HCPC performs both the calculation of the distance matrix, the dendrogram and the partitioning of the population into classes.

plot(cah2) 

res <- HCPC(result_pca,graph = FALSE)
plot(res, choice = "tree" )

#donner Graphique 3D combinant la classification hiérarchique et le plan des facteurs(visualisation)
# Principal components + tree
plot(res, choice = "3D.map")

plot(res, choice = "bar")

plot(res, choice = "map")

fviz_dend(res, 
          cex = 0.7,                     # Taille du text
          palette = "jco",               # Palette de couleur ?ggpubr::ggpar
          rect = TRUE, rect_fill = TRUE, # Rectangle autour des groupes
          rect_border = "jco",           # Couleur du rectangle
          labels_track_height = 0.8 )

#le dendograme suggère une solution de 4 groupes
fviz_cluster(res,
            repel = TRUE,            # Evite le chevauchement des textes
            show.clust.cent = TRUE, # Montre le centre des clusters
            palette = "jco",         # Palette de couleurs, voir ?ggpubr::ggpar
            ggtheme = theme_minimal(),
            main = "Factor map"
)

K-means

K-Means clustering taking into account outliers with 4 clusters

fit <- kmeans(data_scale, 4)
# vary parameters for most readable graph

library(cluster) 
clusplot(data_scale, fit$cluster, color=TRUE, shade=TRUE, 
         labels=2, lines=0)

groupes.kmeans <- kmeans(data_scale,centers=3,nstart=4)
#affichage des résultats
print(groupes.kmeans)
## K-means clustering with 3 clusters of sizes 1, 15, 28
## 
## Cluster means:
##   Seat.Width..Club. Seat.Pitch..Club. Seat..Club. Seat.Width..First.Class.
## 1         6.4824939         6.4824939   6.4824939                0.7791285
## 2        -0.1507557        -0.1507557  -0.1507557               -1.1314261
## 3        -0.1507557        -0.1507557  -0.1507557                0.5782951
##   Seat.Pitch..First.Class. Seats..First.Class. Seat.Width..Business.
## 1                0.9255149           1.6159379             1.3315294
## 2               -1.1144428          -0.9762958             1.3051387
## 3                0.5639688           0.4653035            -0.7467361
##   Seat.Pitch..Business. Seats..Business. Seat.Width..Eco.Comfort.
## 1             0.9543069        0.1267015               -2.7534624
## 2             1.2971462        1.3049355                0.3890878
## 3            -0.7289822       -0.7035977               -0.1101019
##   Seat.Pitch..Eco.Comfort. Seats..Eco.Comfort. Seat.Width..Economy.
## 1               -2.7580545          -1.8992508           -6.3970861
## 2                0.3986931           1.0195399            0.2295612
## 3               -0.1150837          -0.4783517            0.1054882
##   Seat.Pitch..Economy. Seats..Economy. Accommodation Cruising.Speed..mph.
## 1           -6.4538472      -2.1723401    -1.4905983           0.04441488
## 2            0.1466345       0.8188510     0.9086150           0.45215440
## 3            0.1519403      -0.3610866    -0.4335224          -0.24381181
##   Range..miles.    Engines Wingspan..ft. Tail.Height..ft. Length..ft.
## 1    -0.3284213 -0.1507557    -0.4796928       -0.3250485  -0.9664471
## 2     1.1303122  0.2914610     1.0549468        0.9926835   1.0074765
## 3    -0.5937951 -0.1507557    -0.5480182       -0.5201858  -0.5052036
##         Wifi      Video      Power  Satellite   Flat.bed    Sleeper
## 1  0.8618163  0.7844233 -1.1339688 -0.3928183 -0.5012970 -0.3928183
## 2 -0.8678641  0.7844233  0.8618163 -0.2007738  0.9691742  0.5674042
## 3  0.4341480 -0.4482419 -0.4211884  0.1215866 -0.5012970 -0.2899373
##         Club First.Class   Business Eco.Comfort
## 1  6.4824939   0.9024378  1.2458487 -2.76092870
## 2 -0.1507557  -1.0829253  1.2458487  0.35396522
## 3 -0.1507557   0.5479086 -0.7119136 -0.09101963
## 
## Clustering vector:
##  [1] 3 1 3 3 2 2 2 2 3 3 3 3 3 2 3 2 3 3 3 3 2 3 2 2 3 3 2 3 2 2 2 2 2 3 3
## [36] 3 3 3 3 3 3 3 3 3
## 
## Within cluster sum of squares by cluster:
## [1]   0.000 150.626 417.797
##  (between_SS / total_SS =  58.7 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"

method validation

elbow method

One method to validate the number of clusters is the elbow method. The idea of the elbow method is to run k-means clustering on the dataset for a range of values of k (say, k from 1 to 10 in the examples above), and for each value of k calculate the sum of squared errors (SSE)

fviz_nbclust(data_scale, kmeans, method = "wss") +
  geom_vline(xintercept = 4, linetype = 2)+
  labs(subtitle = "Elbow method")

Silhouette method

fviz_nbclust(data_scale, kmeans, method = "silhouette")+
  labs(subtitle = "Silhouette method")

Calinsky criterion

library(fpc)
## Warning: package 'fpc' was built under R version 3.4.3
## 
## Attaching package: 'fpc'
## The following object is masked from 'package:dbscan':
## 
##     dbscan
sol.kmeans <- kmeansruns(data_scale,krange=2:10,criterion="ch")
plot(1:10,sol.kmeans$crit,type="b",xlab="Nb. de groupes",ylab="Silhouette")

#From k = 4 clusters, the addition of an additional group does not "significantly" increase the share of inertia explained by the partition.